Secret softmax function calculation system, secret softmax calculation apparatus, secret softmax calculation method, secret neural network calculation system, secret neural network learning system, and program

ABSTRACT

Techniques for performing secure computing of softmax functions at high speed and with high accuracy are provided. A secure softmax function calculation system that calculates a share ([[softmax (u₁)]], . . . , [[softmax (u_(J))]]) from a share ([[u₁]], . . . , [[u_(J)]]) includes a subtraction means for calculating a share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u_(J)]]), a first secure batch mapping calculation means for calculating a share ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u_(J))]]), an addition means for calculating a share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]), and a second secure batch mapping calculation means for calculating a share ([[softmax (u₁)]], . . . , [[softmax (u_(J))]]).

TECHNICAL FIELD

The present invention relates to secure computing techniques, and particularly relates to techniques for secure computing of softmax functions.

BACKGROUND ART

Conventional methods for secure computing of softmax functions include SecureML described in NPL 1 and SecureNN described in NPL 2. Here, softmax functions are non-linear functions represented by the following equation.

[Math. 1]

$\mathrm{softmax}(u_{i}) = \frac{\exp(u_{i})}{\sum_{j=1}^{J}\exp(u_{j})} \qquad (1)$

Secure computing is a method for obtaining the result of a specified operation without restoring encrypted numerical values (for example, see Reference NPL 1). In the method of Reference NPL 1, a numerical value is encrypted by distributing a plurality of pieces of information from which the value can be restored among three secure computing devices, and the results of addition and subtraction, addition of a constant, multiplication, multiplication by a constant, logic operations (NOT, AND, OR, XOR), and data format conversions (integer, binary) can be kept distributed, that is, kept encrypted, across the three secure computing devices without restoring the numerical value. In general, the number of shares is not limited to 3 but can be W (W is a predetermined constant greater than or equal to 3), and a protocol that implements secure computing through coordinated calculation by W secure computing devices is referred to as a multi-party protocol.

-   (Reference NPL 1: Koji Chida, Koki Hamada, Dai Ikarashi, Katsumi Takahashi, “A Lightweight Three-party Secure Function Evaluation with Error Detection and Its Experimental Result,” In CSS, 2010.)

CITATION LIST

Non Patent Literature

-   NPL 1: Payman Mohassel and Yupeng Zhang, “SecureML: A System for Scalable Privacy-Preserving Machine Learning”, 2017 IEEE Symposium on Security and Privacy, pp. 19-38, 2017.
-   NPL 2: Sameer Wagh, Divya Gupta, and Nishanth Chandran, “SecureNN: 3-Party Secure Computation for Neural Network Training”, Proceedings on Privacy Enhancing Technologies; 2019 (3): 26-49, 2019.

SUMMARY OF THE INVENTION

Technical Problem

However, as can be seen from Equation (1), softmax functions include the calculation of exponential functions and divisions, which secure computing handles poorly, so it is not easy to perform secure computing while ensuring both processing speed and accuracy. Conventionally, a function ReLU (x)=max (0, x) is used to approximate the exponential function exp (x), so the approximation accuracy is low; in particular, the greater the value of x, the greater the error and the lower the accuracy of the calculation of softmax functions.
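The loss of accuracy can be illustrated on plaintext values. The following Python sketch compares Equation (1) with a variant in which exp (x) is replaced by ReLU (x). The function names, the sample input, and the uniform fallback used when all inputs are non-positive are assumptions made for illustration only.

```python
import math

def softmax(u):
    """Exact softmax of Equation (1)."""
    exps = [math.exp(x) for x in u]
    total = sum(exps)
    return [e / total for e in exps]

def relu_softmax(u):
    """Variant in which exp(x) is replaced by ReLU(x) = max(0, x)."""
    relus = [max(0.0, x) for x in u]
    total = sum(relus)
    if total == 0.0:
        # Assumption: fall back to a uniform output when every input is <= 0.
        return [1.0 / len(u)] * len(u)
    return [r / total for r in relus]

u = [1.0, 2.0, 6.0]                      # hypothetical input vector
exact = softmax(u)
approx = relu_softmax(u)
max_error = max(abs(a - b) for a, b in zip(exact, approx))
```

For this input the largest element receives a weight of 6/9 under the ReLU variant, whereas the exact softmax assigns it a value above 0.97, so the error grows with the magnitude of the input.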

Thus, an object of the present invention is to provide techniques for performing secure computing of softmax functions at high speed and with high accuracy.

Means for Solving the Problem

An aspect of the present invention is a secure softmax function calculation system for calculating a share ([[softmax (u₁)]], . . . , [[softmax (u_(J))]]) of a value (softmax (u₁), . . . , softmax (u_(J))) of a softmax function for an input vector (u₁, . . . , u_(J)) from a share ([[u₁]], . . . , [[u_(J)]]) of the input vector (u₁, . . . , u_(J)) (where J is an integer greater than or equal to 1), the secure softmax function calculation system being constituted with three or more secure softmax function calculation apparatuses, map₁ being secure batch mapping defined by a parameter (a₁, . . . , a_(K)) representing a domain of definition and a parameter (α₁, . . . , α_(K)) representing a range of values (where K is an integer greater than or equal to 2, a₁, . . . , a_(K) are real numbers that meet a₁< . . . <a_(K)) of a function exp (x), and map₂ being secure batch mapping defined by a parameter (b₁, . . . , b_(L)) representing a domain of definition and a parameter (β₁, . . . , β_(L)) representing a range of values (where L is an integer greater than or equal to 2, b₁, . . . , b_(L) are real numbers that meet b₁< . . . <b_(L)) of a function 1/x, the secure softmax function calculation system including: a subtraction means for calculating a share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) from the share ([[u₁]], . . . , [[u_(J)]]); a first secure batch mapping calculation means for calculating map₁ (([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]))=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]) from the share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) (where f(i, j) (1≤i, j≤J) is p where a_(p)≤u_(i)−u_(j)<a_(p+1)) to make ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]])=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]); an addition means for calculating a share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) from the share ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]]); and a second secure batch mapping calculation means for calculating map₂ (([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]))=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]) from the share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) (where g(i) (1≤i≤J) is p where b_(p)≤Σ_(j=1) ^(J) exp (u_(j)−u_(i))<b_(p+1)) to make ([[softmax (u₁)]], [[softmax (u₂)]], . . . , [[softmax (u_(J))]])=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]).

Effects of the Invention

According to the present invention, it is possible to perform secure computing of softmax functions at high speed and with high accuracy.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating a configuration of a secure softmax function calculation system 10.

FIG. 2 is a block diagram illustrating a configuration of a secure softmax function calculation apparatus 100 _(i).

FIG. 3 is a flowchart illustrating an operation of the secure softmax function calculation system 10.

FIG. 4 is a block diagram illustrating a configuration of a secure neural network calculation system 20.

FIG. 5 is a block diagram illustrating a configuration of a secure neural network calculation apparatus 200 _(i).

FIG. 6 is a flowchart illustrating an operation of the secure neural network calculation system 20.

FIG. 7 is a block diagram illustrating a configuration of an output layer calculation unit 230 _(i).

FIG. 8 is a flowchart illustrating an operation of an output layer calculation means 230.

FIG. 9 is a block diagram illustrating a configuration of a secure neural network learning system 30.

FIG. 10 is a block diagram illustrating a configuration of a secure neural network learning apparatus 300 _(i).

FIG. 11 is a flowchart illustrating an operation of the secure neural network learning system 30.

FIG. 12 is a block diagram illustrating a configuration of a forward propagation calculation unit 310 _(i).

FIG. 13 is a flowchart illustrating an operation of a forward propagation calculation means 310.

FIG. 14 is a diagram illustrating an example of a functional configuration of a computer achieving each apparatus according to the embodiments of the present invention.

DESCRIPTION OF EMBODIMENTS

Hereinafter, embodiments of the present invention will be described in detail.

Components having the same function are denoted by the same reference signs, and redundant description thereof will be omitted.

Prior to describing each embodiment, the method of notation herein will be described.

_ (underscore) represents a subscript. For example, x^(y_z) represents that y_(z) is a superscript to x, and x_(y_z) represents that y_(z) is a subscript to x.

A superscript “^” or “~” on a character x, as in ^x or ~x, should properly be written directly above “x”, but is written as ^x or ~x owing to the notational limitations of this description.

Technical Background

<<Preparation>>

The secure computing in the embodiments of the invention of the present application is built up from a combination of existing secure computing operations. The operations required for the secure computing are concealment, addition, multiplication, secure batch mapping, and right shift. Each of the operations will be described below.

[Concealment]

[[x]] is a value obtained by concealing x by secret sharing (hereinafter referred to as a share of x). Any secret sharing method can be used. For example, Shamir secret sharing on GF (2⁶¹−1), or replicated secret sharing on Z₂ may be used.

Multiple secret sharing methods may be used in combination in one algorithm. In this case, shares are converted between the methods as appropriate.

For an N-dimensional vector →x=(x₁, . . . , x_(N)), [[→x]]:=([[x₁]], . . . , [[x_(N)]]). In other words, [[→x]] is a vector whose n-th element is the share [[x_(n)]] of the n-th element x_(n) of →x. Similarly, for an M×N matrix A=(a_(m, n)) (1≤m≤M, 1≤n≤N), [[A]] is a matrix whose (m, n) element is the share [[a_(m, n)]] of the (m, n) element a_(m, n) of A. Here, a:=b represents that a is defined by b.

Note that x is a plaintext of [[x]].

Methods for determining [[x]] from x (concealment) and methods for determining x from [[x]] (restoration) specifically include methods described in Reference NPL 1 and Reference NPL 2.

-   (Reference NPL 2: Shamir, A., “How to share a secret”, Communications of the ACM, Vol. 22, No. 11, pp. 612-613, 1979.)

[Addition and Multiplication]

An addition [[x]]+[[y]] by secure computing outputs [[x+y]] with [[x]] and [[y]] as inputs. A multiplication [[x]]×[[y]] (also represented as mul ([[x]], [[y]])) by secure computing outputs [[x×y]] with [[x]] and [[y]] as inputs.

Note that either of [[x]] and [[y]] may be a value that is not concealed (hereinafter referred to as a “public value”). For example, with β and γ as public values, [[x+β]] can be output with [[x]] and β as inputs, or [[γ×y]] can be output with γ and [[y]] as inputs.
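As a non-limiting illustration of these operations, the following Python sketch uses simple additive secret sharing modulo the prime 2⁶¹−1. Additive sharing is substituted here for brevity and is not the Shamir or replicated scheme named above; the function names are hypothetical. Multiplication of two shares requires communication between the parties and is therefore not shown.

```python
import secrets

P = 2**61 - 1  # prime modulus, matching the field size mentioned above

def share(x, parties=3):
    """Split x into additive shares modulo P (illustrative scheme only)."""
    parts = [secrets.randbelow(P) for _ in range(parties - 1)]
    parts.append((x - sum(parts)) % P)
    return parts

def reconstruct(shares):
    """Restore the plaintext by summing all shares modulo P."""
    return sum(shares) % P

def add(xs, ys):
    """[[x]] + [[y]]: each party adds its own two shares locally."""
    return [(a + b) % P for a, b in zip(xs, ys)]

def add_public(xs, beta):
    """[[x + beta]]: one designated party adds the public value."""
    return [(xs[0] + beta) % P] + xs[1:]

def mul_public(gamma, ys):
    """[[gamma * y]]: each party scales its share by the public value."""
    return [(gamma * b) % P for b in ys]

x_sh, y_sh = share(20), share(22)
sum_sh = add(x_sh, y_sh)
```

Restoring `sum_sh` yields 42 although no single party has seen 20 or 22.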

Specific methods of addition and multiplication include methods described in Reference NPL 3 and Reference NPL 4.

-   (Reference NPL 3: Ben-Or, M., Goldwasser, S. and Wigderson, A., “Completeness theorems for non-cryptographic fault-tolerant distributed computation”, Proceedings of the twentieth annual ACM symposium on Theory of computing, ACM, pp. 1-10, 1988.)
-   (Reference NPL 4: Gennaro, R., Rabin, M. O. and Rabin, T., “Simplified VSS and fast-track multiparty computations with applications to threshold cryptography”, Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing, ACM, pp. 101-111, 1998.)

[Secure Batch Mapping]

A secure batch mapping is a function that evaluates a lookup table, and is a technique in which the domain of definition and the range of values of the function to be calculated can be defined arbitrarily. Because the secure batch mapping performs processing in units of vectors, it is efficient when the same processing is performed on a plurality of inputs. The secure batch mapping is a function defined as follows.

Given as input a share [[→x]]=([[x₁]], . . . , [[x_(N)]]) of a vector →x=(x₁, . . . , x_(N)), the secure batch mapping map uses a parameter (a₁, . . . , a_(K)) representing a domain of definition and a parameter (b₁, . . . , b_(K)) representing a range of values of a function f(x) (where a₁, . . . , a_(K), b₁, . . . , b_(K) are real numbers that meet a₁< . . . <a_(K)) to output a share in which each element of the vector has been mapped, i.e., [[→y]]:=([[y₁]], . . . , [[y_(N)]]) such that, for 1≤n≤N, y_(n)=b_(p) where p satisfies a_(p)≤x_(n)<a_(p+1).

For example, the algorithm of the secure batch mapping described in Reference NPL 5 can be used.

-   (Reference NPL 5: Koki Hamada, Dai Ikarashi, Koji Chida, “A Batch Mapping Algorithm for Secure Function Evaluation,” IEICE Transactions A, Vol. J96-A, No. 4, pp. 157-165, 2013.)
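The input-output relationship of the secure batch mapping can be modeled on plaintext values as in the following Python sketch. It shows only the table semantics (y_(n)=b_(p) where a_(p)≤x_(n)<a_(p+1)) and does not perform any secret sharing. The function name `batch_map` and the clamping of inputs that fall outside the table are assumptions made for illustration.

```python
from bisect import bisect_right

def batch_map(xs, domain, values):
    """Plaintext model: y_n = values[p] where domain[p] <= x_n < domain[p+1].

    Assumption: inputs below domain[0] map to the first entry and inputs
    at or above domain[-1] map to the last entry.
    """
    out = []
    for x in xs:
        p = bisect_right(domain, x) - 1          # largest p with domain[p] <= x
        p = min(max(p, 0), len(domain) - 1)      # clamp to the table
        out.append(values[p])
    return out

domain = [0.0, 1.0, 2.0, 3.0]    # (a_1, ..., a_K), hypothetical values
values = [10, 11, 12, 13]        # (b_1, ..., b_K), hypothetical values
result = batch_map([0.5, 2.0, 2.9, -4.0, 7.0], domain, values)
```

All elements of the input vector are looked up against the same table in one call, which is the property exploited in the softmax calculation.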

[Right Shift]

The right shift rshift takes as inputs the share [[→x]]=([[x₁]], . . . , [[x_(N)]]) of the vector →x=(x₁, . . . , x_(N)) and a public value t, and outputs [[→y]]:=([[y₁]], . . . , [[y_(N)]]) in which each element [[x_(n)]] of the share [[→x]] is arithmetically right-shifted by t bits. Here, the arithmetic right shift is a shift that pads the left side with the sign bit rather than 0. By using a logical right shift rlshift, rshift ([[A×2^(n)]], n−m)=[[A×2^(m)]] can be calculated as in the following equations (see Reference NPL 6).

[Math. 2]

[[A′×2^(n)]]=[[A×2^(n)]]+a×2^(n) (a≥|A|)

[[A′×2^(m)]]=rlshift([[A′×2^(n)]], n−m)

[[A×2^(m)]]=[[A′×2^(m)]]−a×2^(m)

-   (Reference NPL 6: Ibuki Mishina, Dai Ikarashi, Koki Hamada, Ryo Kikuchi. “Designs and Implementations of Efficient and Accurate Secret Logistic Regression,” In CSS, 2018.)
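The offset technique of [Math. 2] can be checked on plaintext integers. In the following Python sketch, v stands for the integer A×2^(n), and a is assumed to be an integer with a≥|A| so that the shifted-in offset a×2^(m) is itself an integer. The function names are hypothetical, and no secret sharing is performed.

```python
def rlshift(value, t):
    """Logical right shift, modeled only for non-negative integers."""
    if value < 0:
        raise ValueError("logical shift is modeled for non-negative values only")
    return value >> t

def rshift_via_logical(v, n, m, a):
    """Arithmetic shift of v = A * 2**n down to scale 2**m using rlshift."""
    lifted = v + a * 2**n               # A' * 2**n, non-negative since a >= |A|
    shifted = rlshift(lifted, n - m)    # A' * 2**m
    return shifted - a * 2**m           # A  * 2**m

n, m, a = 8, 4, 16
v = -1234                               # encodes A = -1234 / 2**8, so |A| < 16
result = rshift_via_logical(v, n, m, a)
```

The result agrees with Python's built-in arithmetic shift `v >> (n - m)`, including for negative v.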

<<Secure Computing of Softmax Function by Secure Batch Mapping>>

In the embodiments of the invention of the present application, secure batch mapping is used to calculate softmax functions while keeping vector (u₁, . . . , u_(J)) concealed. Because softmax functions have a plurality of inputs as can be seen from Equation (1), softmax functions have good compatibility with the nature of secure batch mapping, which is processed in vector units. However, because the processing time of the secure batch mapping is proportional to the size of the lookup table (domain of definition or range of values), the calculation accuracy and processing speed are in a trade-off relationship.

Specifically, in the embodiments of the invention of the present application, a secure batch mapping map₁ that calculates a function exp (x) and a secure batch mapping map₂ that calculates a function 1/x are used to implement secure computing of softmax functions. To do so, instead of Equation (1), the following equation is considered by dividing both the denominator and the numerator of Equation (1) by exp (u_(i)).

[Math. 3]

$\mathrm{softmax}(u_{i}) = \frac{1}{\sum_{j=1}^{J}\exp(u_{j}-u_{i})} \qquad (1)'$
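Written out, the division by exp (u_(i)) proceeds as follows (exp (u_(i)) is always positive, so the division is valid):

$\mathrm{softmax}(u_{i}) = \frac{\exp(u_{i})}{\sum_{j=1}^{J}\exp(u_{j})} = \frac{\exp(u_{i})/\exp(u_{i})}{\sum_{j=1}^{J}\exp(u_{j})/\exp(u_{i})} = \frac{1}{\sum_{j=1}^{J}\exp(u_{j}-u_{i})}$

Because the term with j=i equals exp (0)=1, the denominator is at least 1, so its reciprocal lies in the interval (0, 1].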

Then, a share [[softmax (u_(i))]] of a value softmax (u_(i)) of a softmax function is calculated by the following procedure.

(1) First, [[u_(j)−u_(i)]] is calculated for j=1, 2, . . . , J.

(2) Next, ([[u₁−u_(i)]], [[u₂−u_(i)]], . . . , [[u_(J)−u_(i)]]) is input to the secure batch mapping map₁, and ([[exp (u₁−u_(i))]], [[exp (u₂−u_(i))]], . . . , [[exp (u_(J)−u_(i))]]) is obtained as the output result.

(3) [[Σ_(j=1) ^(J) exp (u_(j)−u_(i))]] is obtained by adding all the output results of (2), i.e., ([[exp (u₁−u_(i))]], [[exp (u₂−u_(i))]], . . . , [[exp (u_(J)−u_(i))]]).

(4) Finally, the addition result [[Σ_(j=1) ^(J) exp (u_(j)−u_(i))]] is input to the secure batch mapping map₂, and [[softmax (u_(i))]] is obtained.

Here, in a case where the processing of (1) to (4) described above is performed for all i=1, 2, . . . , J, the secure batch mapping map₁ and the secure batch mapping map₂ can calculate ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]]) and ([[1/Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[1/Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[1/Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]), respectively, each in a single batch, so the calculation of ([[softmax (u₁)]], . . . , [[softmax (u_(J))]]) can be implemented very efficiently. In other words, with only two secure batch mapping computations, the secure computing of the softmax function that calculates ([[softmax (u₁)]], . . . , [[softmax (u_(J))]]) from ([[u₁]], . . . , [[u_(J)]]) can be implemented.
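The procedure of (1) to (4), applied to all i at once, can be sketched on plaintext values as follows. This Python sketch only models the data flow with two table lookups and performs no secret sharing. The table grids (spacing 1/32 for exp (x), a geometric grid for 1/x), the lower bound −8 of the exp table, the value b_(y)=8, and all function names are assumptions chosen for illustration.

```python
import math
from bisect import bisect_right

def batch_map(xs, domain, values):
    """Plaintext stand-in for secure batch mapping (clamps out-of-range inputs)."""
    out = []
    for x in xs:
        p = min(max(bisect_right(domain, x) - 1, 0), len(domain) - 1)
        out.append(values[p])
    return out

B_Y = 8              # assumed number of fractional bits of the output
STEP = 1 / 32        # assumed grid spacing of the exp table
# map_1: exp(x) on a grid from -8 up to about B_Y * ln 2 (assumed bounds)
A = [-8.0 + p * STEP for p in range(int((B_Y * math.log(2) + 8.0) / STEP) + 1)]
ALPHA = [math.exp(a) for a in A]
# map_2: 1/x on an assumed geometric grid from 1 to 2**B_Y
B = [2 ** (p / 32) for p in range(32 * B_Y + 1)]
BETA = [1 / b for b in B]

def table_softmax(u):
    J = len(u)
    # (1) all J*J differences u_j - u_i, grouped by i
    diffs = [u[j] - u[i] for i in range(J) for j in range(J)]
    # (2) a single batched lookup in map_1
    exps = batch_map(diffs, A, ALPHA)
    # (3) sum each group of J values
    sums = [sum(exps[i * J:(i + 1) * J]) for i in range(J)]
    # (4) a single batched lookup in map_2
    return batch_map(sums, B, BETA)

u = [1.0, 2.0, 3.0]                       # hypothetical input vector
approx = table_softmax(u)
total = sum(math.exp(x) for x in u)
exact = [math.exp(x) / total for x in u]
```

With these assumed grids the table-based result stays within a few percent of the exact softmax; a finer grid reduces the error at the cost of larger tables.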

<<Devising the Design of the Lookup Table>>

As described above, the calculation accuracy and the processing speed of softmax functions are in a trade-off relationship. Thus, in order to increase the processing speed while maintaining a certain degree of calculation accuracy, a lookup table of an appropriate size needs to be designed, i.e., parameters representing the domain of definition and parameters representing the range of values need to be set. To design the lookup table appropriately, the property that the softmax function satisfies 0≤softmax (u_(i))≤1 is utilized.

From the above property, it can be seen that the range of values of the secure batch mapping map₂ that calculates the function 1/x is [0, 1], and thus a lookup table in which the range of values of the secure batch mapping map₂ is [0, 1] is to be created. Accordingly, in a case where fixed point arithmetic is used to perform secure computing, and where b_(y) is the number of bits in the fractional portion of the fixed point representation that gives the accuracy of the output of the softmax function, the size of the lookup table is 2^(b_y).

At this time, the minimum value of the output that can be represented with an accuracy of b_(y) bits is 1/2^(b_y), and the corresponding input is 2^(b_y). Thus, it can be seen that the maximum value of the output of the secure batch mapping map₁ that calculates the function exp (x) is 2^(b_y). This is because, even if a larger value could be calculated, the value obtained by inputting it into the secure batch mapping map₂ that calculates the function 1/x would be the same as when 2^(b_y) is input. Thus, the range of values of the secure batch mapping map₁ that calculates the function exp (x) is [0, 2^(b_y)].

In this way, by utilizing the nature that the value of the softmax function is limited to [0, 1], the embodiments of the invention of the present application can design a lookup table of an appropriate size, and can efficiently perform secure computing of a softmax function using secure batch mapping.
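The bounds discussed above can be computed directly, as in the following Python sketch. The function name is hypothetical, and the last returned quantity, the largest input for which exp (x) does not exceed 2^(b_y), is a consequence derived here for illustration (x=b_(y)×ln 2) rather than a value stated in the text.

```python
import math

def table_bounds(b_y):
    """Bounds implied by b_y fractional output bits (plaintext illustration)."""
    min_output = 1 / 2**b_y                    # smallest positive representable output
    max_useful_sum = 2**b_y                    # 1/x equals min_output at x = 2**b_y
    max_useful_exp_input = b_y * math.log(2)   # exp(x) = 2**b_y at this x (derived)
    return min_output, max_useful_sum, max_useful_exp_input

min_output, max_sum, max_exp_in = table_bounds(8)
```

For b_(y)=8, the exp table therefore needs no entries for inputs much above 5.55, since larger inputs would saturate at 2⁸ under this design.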

First Embodiment

A secure softmax function calculation system 10 will be described below with reference to FIGS. 1 to 3. FIG. 1 is a block diagram illustrating a configuration of the secure softmax function calculation system 10. The secure softmax function calculation system 10 includes W (where W is a predetermined integer greater than or equal to 3) secure softmax function calculation apparatuses 100 ₁, . . . , 100 _(W). The secure softmax function calculation apparatuses 100 ₁, . . . , 100 _(W) are connected to a network 800 and are capable of communicating with each other. The network 800 may be, for example, a communication network such as the Internet or a broadcast communication path. FIG. 2 is a block diagram illustrating a configuration of a secure softmax function calculation apparatus 100 _(i) (1≤i≤W). FIG. 3 is a flowchart illustrating an operation of the secure softmax function calculation system 10.

As illustrated in FIG. 2, the secure softmax function calculation apparatus 100 _(i) includes a subtraction unit 110 _(i), a first secure batch mapping calculation unit 120 _(i), an addition unit 130 _(i), a second secure batch mapping calculation unit 140 _(i), and a recording unit 190 _(i). Each of the components of the secure softmax function calculation apparatus 100 _(i) excluding the recording unit 190 _(i) is configured to be able to execute the operations required for the secure computing, specifically, those operations among at least concealment, addition, multiplication, secure batch mapping, and right shift that are required to implement the function of that component. As specific functional configurations for implementing the individual operations in the present invention, configurations capable of executing the algorithms disclosed in, for example, Reference NPL 1 to 6 are sufficient; these are conventional configurations, so the detailed description thereof will be omitted. The recording unit 190 _(i) is a component that records information required for processing executed by the secure softmax function calculation apparatus 100 _(i). For example, the recording unit 190 _(i) records a parameter (a₁, . . . , a_(K)) representing a domain of definition and a parameter (α₁, . . . , α_(K)) representing a range of values (where K is an integer greater than or equal to 2, a₁, . . . , a_(K) are real numbers that meet a₁< . . . <a_(K)) of a function exp (x) required for definition of a secure batch mapping map₁, and a parameter (b₁, . . . , b_(L)) representing a domain of definition and a parameter (β₁, . . . , β_(L)) representing a range of values (where L is an integer greater than or equal to 2, b₁, . . . , b_(L) are real numbers that meet b₁< . . . <b_(L)) of a function 1/x required for definition of a secure batch mapping map₂.

By way of a coordinated calculation by the W secure softmax function calculation apparatuses 100 _(i), the secure softmax function calculation system 10 implements secure computing of softmax functions, which is a multi-party protocol. Thus, a subtraction means 110 (not illustrated) of the secure softmax function calculation system 10 is constituted with subtraction units 110 ₁, . . . , 110 _(W). A first secure batch mapping calculation means 120 (not illustrated) is constituted with first secure batch mapping calculation units 120 ₁, . . . , 120 _(W). An addition means 130 (not illustrated) is constituted with addition units 130 ₁, . . . , 130 _(W). A second secure batch mapping calculation means 140 (not illustrated) is constituted with second secure batch mapping calculation units 140 ₁, . . . , 140 _(W).

The secure softmax function calculation system 10 calculates a share ([[softmax (u₁)]], . . . , [[softmax (u_(J))]]) of a value (softmax (u₁), . . . , softmax (u_(J))) of a softmax function for an input vector (u₁, . . . , u_(J)) from a share ([[u₁]], . . . , [[u_(J)]]) of the input vector (u₁, . . . , u_(J)) (where J is an integer greater than or equal to 1). The operation of the secure softmax function calculation system 10 will be described below in accordance with FIG. 3.

At S110, the subtraction means 110 calculates a share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) from the share ([[u₁]], . . . , [[u_(J)]]).

At S120, the first secure batch mapping calculation means 120 calculates map₁ (([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]))=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]) from the share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) (where f(i, j) (1≤i, j≤J) is p where a_(p)≤u_(i)−u_(j)<a_(p+1)) to make ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]])=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]).

At S130, the addition means 130 calculates a share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) from the share ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]]).

At S140, the second secure batch mapping calculation means 140 calculates map₂ (([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]))=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]) from the share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) (where g(i) (1≤i≤J) is p where b_(p)≤Σ_(j=1) ^(J) exp (u_(j)−u_(i))<b_(p+1)) to make ([[softmax (u₁)]], [[softmax (u₂)]], . . . , [[softmax (u_(J))]])=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]).

(Setting Example of Domain of Definition and Range of Values of Secure Batch Mapping)

As described in Technical Background, in the case of secure computing using fixed point arithmetic, the parameters α₁ and α_(K) representing the range of values of the secure batch mapping map₁ can be taken as α₁=0 and α_(K)=2^(b_y), respectively (where b_(y) represents the number of bits in the fractional portion of the fixed point representing the accuracy of the output of the softmax function).

With this setting, recording of wasteful ranges of values that are not used in the secure computing can be avoided, and the lookup table search in the calculation of the secure batch mapping also becomes faster.

According to the embodiment of the present invention, it is possible to perform secure computing of softmax functions at high speed and with high accuracy. Here, the domain of definition and the range of values of the secure batch mappings map₁ and map₂ can be set arbitrarily, and thus can be determined according to the required accuracy and processing speed. In the embodiment of the present invention, unlike conventional methods in which exponential functions are approximated, any precision can be set, so secure computing of softmax functions with accuracy comparable to that of plaintext calculation is possible.

Second Embodiment

A secure neural network calculation system 20 will be described below with reference to FIGS. 4 to 6. FIG. 4 is a block diagram illustrating a configuration of the secure neural network calculation system 20. The secure neural network calculation system 20 includes W′ (W′ is a predetermined integer greater than or equal to 3) secure neural network calculation apparatuses 200 ₁, . . . , 200 _(W′). The secure neural network calculation apparatuses 200 ₁, . . . , 200 _(W′) are connected to a network 800 and are capable of communicating with each other. The network 800 may be, for example, a communication network such as the Internet or a broadcast communication path. FIG. 5 is a block diagram illustrating a configuration of the secure neural network calculation apparatus 200 _(i) (1≤i≤W′). FIG. 6 is a flowchart illustrating an operation of the secure neural network calculation system 20.

As illustrated in FIG. 5, the secure neural network calculation apparatus 200 _(i) includes an input layer calculation unit 210 _(i), an n-th layer calculation unit 220-n _(i) (n=1, . . . , N−1, N is an integer greater than or equal to 2, and N−1 represents the number of hidden layers (intermediate layers)), an output layer calculation unit 230 _(i), and a recording unit 190 _(i). Each of the components of the secure neural network calculation apparatus 200 _(i) excluding the recording unit 190 _(i) is configured to be able to execute the operations required for the secure computing, specifically, those operations among at least concealment, addition, multiplication, secure batch mapping, and right shift that are required to implement the function of that component. As specific functional configurations for implementing the individual operations in the present invention, configurations capable of executing the algorithms disclosed in, for example, Reference NPL 1 to 6 are sufficient; these are conventional configurations, so the detailed description thereof will be omitted. The recording unit 190 _(i) is a component that records information required for processing executed by the secure neural network calculation apparatus 200 _(i).

By way of a coordinated calculation by the W′ secure neural network calculation apparatuses 200 _(i), the secure neural network calculation system 20 implements secure computing by the neural network, which is a multi-party protocol. Thus, an input layer calculation means 210 (not illustrated) of the secure neural network calculation system 20 is constituted with input layer calculation units 210 ₁, . . . , 210 _(W′). An n-th layer calculation means 220-n (n=1, . . . , N−1) (not illustrated) is constituted with n-th layer calculation units 220-n ₁, . . . , 220-n _(W′). An output layer calculation means 230 (not illustrated) is constituted with output layer calculation units 230 ₁, . . . , 230 _(W′).

The secure neural network calculation system 20 calculates a share [[Y^(N+1)]] of an output value Y^(N+1) (N is an integer greater than or equal to 2) for input data X from a share [[X]] of the input data X. The operation of the secure neural network calculation system 20 will be described below in accordance with FIG. 6. Here, the input data X, an output value Y¹ of an input layer, an output value Y^(n+1) of an n-th layer (n=1, . . . , N−1), and an output value Y^(N+1) of an output layer are represented by vectors, and a parameter W⁰ of the input layer, a parameter W^(n) (n=1, . . . , N−1) of the n-th layer, and a parameter W^(N) of the output layer are represented by matrices. Note that a share [[W⁰]] of the parameter W⁰ of the input layer, a share [[W^(n)]] of the parameter W^(n) of the n-th layer (n=1, . . . , N−1), and a share [[W^(N)]] of the parameter W^(N) of the output layer may be recorded in advance in the recording unit 190 _(i) (1≤i≤W′), for example.

At S210, the input layer calculation means 210 calculates a share [[Y¹]] of an output value Y¹ of an input layer from a share [[X]] by using a share [[W⁰]] of a parameter W⁰ of an input layer. The input layer calculation means 210 calculates the share [[Y¹]], for example, according to the following equation.

[Math. 4]

[[U ¹]]←[[W ⁰]][[X]]  (a-1)

[[Y ¹]]←Activation([[U ¹]])  (a-2)

Here, Activation represents an activation function. It is assumed that the activation function Activation and its derivative Activation′ are both functions capable of secure computing. As the activation function, for example, a function ReLU (x) can be used. In this case, a derivative ReLU′(x) is given, for example, by the following equation.

[Math. 5]

$\mathrm{ReLU}'(x) = \begin{cases} 0 & (x \leq 0) \\ 1 & (x > 0) \end{cases}$

Note that U¹ is referred to as an intermediate output value of the input layer.

With n=1, . . . , N−1, at S220-n, the n-th layer calculation means 220-n calculates the share [[Y^(n+1)]] of the output value Y^(n+1) of the n-th layer from the share [[Y^(n)]] by using the share [[W^(n)]] of the parameter W^(n) of the n-th layer. The n-th layer calculation means 220-n calculates the share [[Y^(n+1)]], for example, according to the following equation.

[Math. 6]

[[U ^(n+1)]]←[[W ^(n)]][[Y ^(n)]]  (b-1)

[[Y ^(n+1)]]←Activation([[U ^(n+1)]])  (b-2)

Note that U^(n+1) is referred to as an intermediate output value of the n-th layer.

At S230, the output layer calculation means 230 calculates the share [[Y^(N+1)]] of the output value Y^(N+1) of the output layer from the share [[Y^(N)]] by using the share [[W^(N)]] of the parameter W^(N) of the output layer. The output layer calculation means 230 calculates the share [[Y^(N+1)]], for example, according to the following equation.

[Math. 7]

[[U ^(N+1)]]←[[W ^(N)]][[Y ^(N)]]  (c-1)

[[Y ^(N+1)]]←softmax([[U ^(N+1)]])  (c-2)

Note that U^(N+1) is referred to as an intermediate output value of the output layer.
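For orientation, the layer-by-layer calculation of Relationships (a-1) to (c-2) can be sketched on plaintext values as follows. This is only an illustrative sketch: it operates on ordinary floating point numbers rather than on shares, ReLU is assumed as the activation function, and the helper names (matvec, relu, softmax, forward) are hypothetical names introduced for this example.

```python
import math

def matvec(W, y):
    """Multiply matrix W (a list of rows) by vector y."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]

def relu(u):
    """ReLU activation applied element-wise (assumed activation function)."""
    return [max(0.0, v) for v in u]

def softmax(u):
    """Plain softmax as in Equation (1); no secret sharing is involved."""
    exps = [math.exp(v) for v in u]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, x):
    """weights = [W0, W1, ..., WN] with N >= 2; returns the output Y^(N+1)."""
    y = relu(matvec(weights[0], x))          # (a-1), (a-2): input layer
    for W in weights[1:-1]:                  # (b-1), (b-2): n = 1, ..., N-1
        y = relu(matvec(W, y))
    return softmax(matvec(weights[-1], y))   # (c-1), (c-2): output layer

# Tiny example with identity matrices (made-up values).
identity = [[1.0, 0.0], [0.0, 1.0]]
output = forward([identity, identity, identity], [1.0, 2.0])
```

In the secure setting, each matrix product and each activation would be replaced by the corresponding operation on shares.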

The output layer calculation means 230 will be described below with reference to FIGS. 7 to 8. FIG. 7 is a block diagram illustrating a configuration of the output layer calculation unit 230 _(i). FIG. 8 is a flowchart illustrating an operation of the output layer calculation means 230. A subtraction means 110 (not illustrated) of the output layer calculation means 230 is constituted with subtraction units 110 ₁, . . . , 110 _(W′). A first secure batch mapping calculation means 120 (not illustrated) is constituted with first secure batch mapping calculation units 120 ₁, . . . , 120 _(W′). An addition means 130 (not illustrated) is constituted with addition units 130 ₁, . . . , 130 _(W′). A second secure batch mapping calculation means 140 (not illustrated) is constituted with second secure batch mapping calculation units 140 ₁, . . . , 140 _(W′). An intermediate output calculation means 231 (not illustrated) is constituted with intermediate output calculation units 231 ₁, . . . , 231 _(W′). As illustrated in FIG. 7, the output layer calculation unit 230 _(i) includes a subtraction unit 110 _(i), a first secure batch mapping calculation unit 120 _(i), an addition unit 130 _(i), a second secure batch mapping calculation unit 140 _(i), and an intermediate output calculation unit 231 _(i).

The operation of the output layer calculation means 230 will be described in accordance with FIG. 8.

At S231, the intermediate output calculation means 231 calculates the share [[U^(N+1)]] of the intermediate output value U^(N+1) of the output layer from the share [[Y^(N)]] by using the share [[W^(N)]] of the parameter W^(N) of the output layer. The intermediate output calculation means 231 calculates the share [[U^(N+1)]], for example, according to Relationship (c-1).

Hereinafter, letting the intermediate output value be U^(N+1)=(u₁, . . . , u_(J)) (where J is an integer greater than or equal to 1), the share of the intermediate output value U^(N+1) is [[U^(N+1)]]=([[u₁]], . . . , [[u_(J)]]).

At S110, the subtraction means 110 calculates a share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) from the share ([[u₁]], . . . , [[u_(J)]]).

At S120, the first secure batch mapping calculation means 120 calculates map₁ (([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]))=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]) from the share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) (where f(i, j) (1≤i, j≤K) is p where a_(p)≤u_(i)−u_(j)<a_(p+1)) to make ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]])=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]).

At S130, the addition means 130 calculates a share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) from the share ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]]).

At S140, the second secure batch mapping calculation means 140 calculates map₂ (([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]))=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]) from the share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) (where g(i) (1≤i≤L) is p where b_(p)≤Σ_(j=1) ^(J) exp (u_(j)−u_(i))<b_(p+1)) to make [[Y^(N+1)]]=([[softmax (u₁)]], [[softmax (u₂)]], . . . , [[softmax (u_(J))]])=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]).
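The steps S110 to S140 rely on the identity softmax (u_(i)) = 1/Σ_(j=1) ^(J) exp (u_(j)−u_(i)), which follows from dividing the numerator and denominator of Equation (1) by exp (u_(i)). The following is a plaintext sketch of the four steps, not the secure protocol itself: the lookup tables stand in for map₁ and map₂, and the helper names (make_table, batch_map, table_softmax), the table ranges, the table sizes, and the clamping of out-of-range inputs are all assumptions made for illustration.

```python
import bisect
import math

def make_table(func, lo, hi, size):
    """Hypothetical helper: sample func at evenly spaced points a_1 < ... < a_K."""
    step = (hi - lo) / (size - 1)
    domain = [lo + p * step for p in range(size)]
    values = [func(a) for a in domain]
    return domain, values

def batch_map(table, xs):
    """Plaintext stand-in for secure batch mapping: x -> alpha_p, a_p <= x < a_(p+1)."""
    domain, values = table
    out = []
    for x in xs:
        p = bisect.bisect_right(domain, x) - 1
        p = min(max(p, 0), len(domain) - 1)  # clamp out-of-range inputs (assumption)
        out.append(values[p])
    return out

def table_softmax(u, exp_table, inv_table):
    J = len(u)
    # S110: differences u_j - u_i, grouped by the subtracted index i.
    diffs = [u[j] - u[i] for i in range(J) for j in range(J)]
    # S120: exp of every difference via the first table (map_1).
    exps = batch_map(exp_table, diffs)
    # S130: for each i, sum exp(u_j - u_i) over j.
    sums = [sum(exps[i * J:(i + 1) * J]) for i in range(J)]
    # S140: reciprocal via the second table (map_2) yields softmax(u_i).
    return batch_map(inv_table, sums)

# Made-up table ranges that cover the example input below.
exp_table = make_table(math.exp, -4.0, 4.0, 8001)
inv_table = make_table(lambda s: 1.0 / s, 1.0, 200.0, 20001)
approx = table_softmax([1.0, 2.0, 3.0], exp_table, inv_table)
```

Because every sum contains the term exp (u_(i)−u_(i)) = 1, the input of the second table is always at least about 1, which is why the reciprocal table in this sketch starts at 1.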

According to the embodiment of the present invention, it is possible to perform secure computing by the neural network at high speed and with high accuracy. By using the first embodiment, softmax functions can be calculated with high accuracy, so the secure computing by the neural network can be performed with high accuracy compared to the conventional techniques of NPL 1 or NPL 2.

(Modification)

In secure computing, processing floating point numbers is costly. Thus, a case is considered here in which each of the components performs its calculation using fixed point numbers.

First, the accuracy required for each variable or constant is set. For example, in a case where the accuracy is α bits (α is the number of bits in the fractional part), the share [[x]] is secret-shared as a base-2 fixed point number [[x×2^(α)]]. Note that the accuracies of the variables and constants are not all the same because the range of values differs for each variable or constant, and the required accuracies therefore differ as well. For example, a variable representing a parameter is preferably set to higher accuracy than other variables because a parameter tends to be small. Letting b_(x), b_(y), and b_(w) be the accuracies required for the variable x representing the input data X, the variable y representing the output value Y^(N+1) of the output layer, and the variable w representing the parameter W⁰ of the input layer, the parameter W^(n) (n=1, . . . , N−1) of the n-th layer, and the parameter W^(N) of the output layer, respectively, the accuracies are set to, for example, b_(x)=8, b_(y)=14, and b_(w)=20. In this way, by setting the accuracy required for each variable and constant, secure computing, whose processing costs are large, can be performed efficiently while the overall number of bits is kept as small as possible.

In a case where the processing is performed with fixed point numbers, there is a problem that the accuracy changes when multiplication is performed. For example, multiplying [[x₁×2^(α)]] by [[x₂×2^(α)]] results in [[x₁x₂×2^(2α)]], which has α more fractional bits than the original values. In the calculation of the neural network, multiplication is performed in each of the layers, and thus, if the processing is performed as is, overflow occurs in the middle of the process. Thus, here, by using the right shift rshift, the calculation in the neural network is efficiently implemented with fixed point numbers while intentionally dropping digits and preventing overflow.
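The growth of the scale under multiplication, and its correction by a right shift, can be illustrated on plaintext integers. This is a minimal sketch under the assumption that a fixed point number is stored as the integer round(x×2^(α)); the names encode, decode, and ALPHA, and the example values, are hypothetical.

```python
def encode(x, bits):
    """Represent the real number x as the integer round(x * 2**bits)."""
    return round(x * (1 << bits))

def decode(v, bits):
    """Interpret the integer v as a fixed point number with `bits` fractional bits."""
    return v / (1 << bits)

ALPHA = 8                      # number of fractional bits (made-up value)
a = encode(1.5, ALPHA)         # 1.5  * 2**8
b = encode(2.25, ALPHA)        # 2.25 * 2**8
prod = a * b                   # the scale is now 2**(2 * ALPHA)
rescaled = prod >> ALPHA       # right shift drops ALPHA fractional bits again
```

Here prod must be decoded with 2×ALPHA fractional bits, whereas rescaled is back at ALPHA fractional bits. Without the shift, every further multiplication would add another ALPHA bits.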

A detailed description will be given below. The operation of the secure neural network calculation system 20, which is devised to prevent overflow caused by multiplication, will be described (see FIG. 6), assuming that the accuracies required for the variable x, the variable y, and the variable w are b_(x), b_(y), and b_(w), respectively.

At S210, the input layer calculation means 210 calculates the share [[Y¹]] according to Relationship (a-1) and Relationship (a-2).

With n=1, . . . , N−1, at S220-n, the n-th layer calculation means 220-n calculates the share [[Y^(n+1)]] according to the following equation.

[Math. 8]

[[U ^(n+1)]]←[[W ^(n)]][[Y ^(n)]]  (b-1)

[[Y ^(n+1)]]←Activation([[U ^(n+1)]])  (b-2)

[[Y ^(n+1)]]←rshift([[Y ^(n+1)]],b _(w))  (b-3)

At S230, the output layer calculation means 230 calculates the share [[Y^(N+1)]] according to Relationship (c-1) and Relationship (c-2).

Here, the features of the processing of S210 to S230 described above will be described. In general, because the right shift is a costly operation in secure computing, the number of right shifts should be reduced as much as possible. Thus, the processing of S210 to S230 performs only the minimum required right shift calculation (the calculation of Relationship (b-3)). Because the domain of definition and the range of values of the secure batch mapping can be set arbitrarily, the number of digits can be adjusted by the secure batch mapping in the same way as by the right shift. Thus, by including the effect of the right shift in the secure batch mapping in the calculation of Relationship (c-2), the accuracy adjustment from 2b_(w)+b_(x) to b_(y) can be performed within the calculation of Relationship (c-2).
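One way to picture how a lookup table can absorb the accuracy adjustment is to encode the table's domain points at the input accuracy and its values at the output accuracy. The following plaintext sketch assumes that approach. The names make_rescaling_table and lookup, the choice of the function 1/x, and the table range are hypothetical and are not taken from the embodiment.

```python
import bisect

def make_rescaling_table(func, lo, hi, size, in_bits, out_bits):
    """Domain points carry in_bits fractional bits, values carry out_bits."""
    step = (hi - lo) / (size - 1)
    reals = [lo + p * step for p in range(size)]
    domain = [round(a * (1 << in_bits)) for a in reals]
    values = [round(func(a) * (1 << out_bits)) for a in reals]
    return domain, values

def lookup(table, x_encoded):
    """Return the value at the largest domain point not exceeding x_encoded."""
    domain, values = table
    p = bisect.bisect_right(domain, x_encoded) - 1
    return values[min(max(p, 0), len(values) - 1)]  # clamping is an assumption

b_x, b_w, b_y = 8, 20, 14
in_bits = 2 * b_w + b_x        # accuracy of the table input
table = make_rescaling_table(lambda s: 1.0 / s, 1.0, 5.0, 5, in_bits, b_y)
result = lookup(table, round(2.0 * (1 << in_bits)))  # input 2.0 at in_bits
```

The result is an integer carrying b_(y) fractional bits, so no separate right shift is needed after the lookup.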

The processing of S210 to S230 described above will be described below in terms of accuracy. In the calculation of Relationship (a-1) and the calculation of Relationship (a-2) of the input layer calculation means 210 at S210, the accuracy is b_(w)+b_(x) in both cases. In the calculation of Relationship (b-1) and the calculation of Relationship (b-2) of the first layer calculation means 220-1 at S220-1, the accuracy is 2b_(w)+b_(x) in both cases. Thus, in a case where the calculation of Relationship (b-1) and the calculation of Relationship (b-2) of the second layer calculation means 220-2 at S220-2 are performed as is, the accuracy is 3b_(w)+b_(x) in both cases. In a case where this is repeated, the digits increase by b_(w) each time the calculation proceeds through an intermediate layer. Thus, the n-th layer calculation means 220-n re-calculates the share [[Y^(n+1)]] according to Relationship (b-3). In this way, it is possible to solve the problem that the digits increase by b_(w) each time the calculation in the layer proceeds. Thus, the accuracy at the time that the calculation of Relationship (b-3) of the (N−1)-th layer calculation means 220-(N−1) has finished is b_(w)+b_(x). Then, in the calculation of Relationship (c-1) and the calculation of Relationship (c-2) of the output layer calculation means 230 at S230, the accuracy is 2b_(w)+b_(x) and b_(y), respectively.
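The accuracy bookkeeping described above can be checked with a small script that only tracks the number of fractional bits at each step. This is an illustrative sketch; the function name forward_accuracy and the step labels are hypothetical, and num_hidden corresponds to N−1.

```python
def forward_accuracy(num_hidden, b_x, b_w, b_y):
    """Return a list of (step, fractional_bits) pairs for the forward pass."""
    trace = []
    acc = b_w + b_x                        # (a-1): W0 times X
    trace.append(("a-1", acc))
    trace.append(("a-2", acc))             # the activation keeps the scale
    for n in range(1, num_hidden + 1):
        acc += b_w                         # (b-1): W^n times Y^n
        trace.append((f"b-1[{n}]", acc))
        trace.append((f"b-2[{n}]", acc))
        acc -= b_w                         # (b-3): right shift by b_w
        trace.append((f"b-3[{n}]", acc))
    acc += b_w                             # (c-1): W^N times Y^N
    trace.append(("c-1", acc))
    trace.append(("c-2", b_y))             # the mapping outputs b_y bits directly
    return trace

# Example values from the text: b_x = 8, b_w = 20, b_y = 14, with N - 1 = 2.
trace = dict(forward_accuracy(2, 8, 20, 14))
```

With these values, the accuracy returns to b_(w)+b_(x) = 28 bits after every Relationship (b-3), instead of growing by 20 bits per layer.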

Third Embodiment

A secure neural network learning system 30 will be described below with reference to FIGS. 9 to 11. FIG. 9 is a block diagram illustrating a configuration of the secure neural network learning system 30. The secure neural network learning system 30 includes W″ (W″ is a predetermined integer greater than or equal to 3) secure neural network learning apparatuses 300 ₁, . . . , 300 _(W″). The secure neural network learning apparatuses 300 ₁, . . . , 300 _(W″) are connected to a network 800 and are capable of communicating with each other. The network 800 may be, for example, a communication network such as the Internet or a broadcast communication path. FIG. 10 is a block diagram illustrating a configuration of the secure neural network learning apparatus 300 _(i) (1≤i≤W″). FIG. 11 is a flowchart illustrating an operation of the secure neural network learning system 30.

As illustrated in FIG. 10, the secure neural network learning apparatus 300 _(i) includes an initialization unit 305 _(i), a forward propagation calculation unit 310 _(i), a back propagation calculation unit 320 _(i), a gradient calculation unit 330 _(i), a parameter update unit 340 _(i), an end condition determination unit 350 _(i), and a recording unit 190 _(i). Each of the components of the secure neural network learning apparatus 300 _(i) excluding the recording unit 190 _(i) is configured such that operations required for the secure computing, specifically, operations required to implement functions of each of the components among at least concealment, addition, multiplication, secure batch mapping, and right shift, can be executed. Specific functional configurations for implementing individual operations in the present invention are sufficient to be configurations such that the algorithms disclosed in, for example, each of Reference NPL 1 to 6 can be executed, and these are conventional configurations, so the detailed description thereof will be omitted. The recording unit 190 _(i) is a component that records information required for processing executed by the secure neural network learning apparatus 300 _(i).

By way of a coordinated calculation by the W″ secure neural network learning apparatuses 300 _(i), the secure neural network learning system 30 implements secure computing for learning of a neural network, which is a multi-party protocol. Thus, an initialization means 305 (not illustrated) of the secure neural network learning system 30 is constituted with initialization units 305 ₁, . . . , 305 _(W″). A forward propagation calculation means 310 (not illustrated) is constituted with forward propagation calculation units 310 ₁, . . . , 310 _(W″). A back propagation calculation means 320 (not illustrated) is constituted with back propagation calculation units 320 ₁, . . . , 320 _(W″). A gradient calculation means 330 (not illustrated) is constituted with gradient calculation units 330 ₁, . . . , 330 _(W″). A parameter update means 340 (not illustrated) is constituted with parameter update units 340 ₁, . . . , 340 _(W″). An end condition determination means 350 (not illustrated) is constituted with end condition determination units 350 ₁, . . . , 350 _(W″).

The secure neural network learning system 30 updates a share [[W⁰]] of a parameter W⁰ of an input layer, a share [[W^(n)]] of a parameter W^(n) of an n-th layer (n=1, . . . , N−1, N is an integer greater than or equal to 2), and a share [[W^(N)]] of a parameter W^(N) of an output layer by using a share [[X]] of learning data X and a share [[T]] of a learning label T. The operation of the secure neural network learning system 30 will be described below in accordance with FIG. 11. Here, the learning data X, the learning label T, an output value Y¹ of the input layer, an output value Y^(n+1) of the n-th layer (n=1, . . . , N−1), and an output value Y^(N+1) of the output layer are represented by vectors, and the parameter W⁰ of the input layer, the parameter W^(n) (n=1, . . . , N−1) of the n-th layer, and the parameter W^(N) of the output layer are represented by matrices. Note that the share [[W⁰]] of the parameter W⁰ of the input layer, the share [[W^(n)]] of the parameter W^(n) of the n-th layer (n=1, . . . , N−1), and the share [[W^(N)]] of the parameter W^(N) of the output layer in learning may be recorded in the recording unit 190 _(i) (1≤i≤W″), for example. Note that the operation of the secure neural network learning system 30 described here corresponds to a learning method of a neural network using a gradient descent method and back propagation.

At S305, the initialization means 305 sets initial values of the share [[W⁰]] of the parameter W⁰ of the input layer, the share [[W^(n)]] of the parameter W^(n) of the n-th layer (n=1, . . . , N−1), and the share [[W^(N)]] of the parameter W^(N) of the output layer. The initialization means 305 also appropriately sets the values required for the determination of the end of learning. For example, in a case where the number of learning times is used for the determination of the end of learning, the initialization means 305 sets an initial value of a counter t representing the number of learning times (specifically, t=0) and a threshold value T representing the number of repetitive learning times. In a case where a criterion of whether or not the amount of change of the parameter has become sufficiently small is used instead of using the number of learning times for the determination of the end of learning, the initialization means 305 sets a threshold value c representing the degree of convergence.

At S310, the forward propagation calculation means 310 calculates a share [[Y¹]] of an output value Y¹ of an input layer, a share [[Y^(n+1)]] of an output value Y^(n+1) of an n-th layer (n=1, . . . , N−1), and a share [[Y^(N+1)]] of an output value Y^(N+1) of an output layer from the share [[X]] by using the share [[W^(n)]] (n=0, . . . , N). The forward propagation calculation means 310 will be described below with reference to FIGS. 12 and 13. FIG. 12 is a block diagram illustrating a configuration of the forward propagation calculation unit 310 _(i). FIG. 13 is a flowchart illustrating an operation of the forward propagation calculation means 310. An input layer calculation means 210 (not illustrated) of the forward propagation calculation means 310 is constituted with input layer calculation units 210 ₁, . . . , 210 _(W″), an n-th layer calculation means 220-n (n=1, . . . , N−1) (not illustrated) is constituted with n-th layer calculation units 220-n ₁, . . . , 220-n _(W″), and an output layer calculation means 230 (not illustrated) is constituted with output layer calculation units 230 ₁, . . . , 230 _(W″). As illustrated in FIG. 12, the forward propagation calculation unit 310 _(i) includes an input layer calculation unit 210 _(i), an n-th layer calculation unit 220-n _(i) (n=1, . . . , N−1, N is an integer greater than or equal to 2, and N−1 represents the number of hidden layers (intermediate layers)), and an output layer calculation unit 230 _(i).

The operation of the forward propagation calculation means 310 will be described in accordance with FIG. 13.

At S210, the input layer calculation means 210 calculates the share [[Y¹]] of the output value Y¹ of the input layer from the share [[X]] by using the share [[W⁰]] of the parameter W⁰ of the input layer.

With n=1, . . . , N−1, at S220-n, the n-th layer calculation means 220-n calculates the share [[Y^(n+1)]] of the output value Y^(n+1) of the n-th layer from the share [[Y^(n)]] by using the share [[W^(n)]] of the parameter W^(n) of the n-th layer.

At S230, the output layer calculation means 230 calculates the share [[Y^(N+1)]] of the output value Y^(N+1) of the output layer from the share [[Y^(N)]] by using the share [[W^(N)]] of the parameter W^(N) of the output layer. Note that the processing in the output layer calculation means 230 is the same as that described in Second Embodiment using FIGS. 7 and 8.

At S320, the back propagation calculation means 320 calculates a share [[Z^(N+1)]] of an error Z^(N+1) in an output layer, a share [[Z^(n+1)]] of an error Z^(n+1) in an n-th layer (n=N−1, . . . , 1), and a share [[Z¹]] of an error Z¹ in an input layer from the share [[Y^(N+1)]] and the share [[T]] by using the share [[W^(n)]] (n=N, . . . , 1). The back propagation calculation means 320 calculates the share [[Z^(N+1)]], for example, according to the following equation.

[Math. 9]

[[Z ^(N+1)]]←[[Y ^(N+1)]]−[[T]]  (p-1)

The back propagation calculation means 320 calculates the share [[Z^(N)]], for example, according to the following equation.

[Math. 10]

[[Z ^(N)]]←Activation′([[U ^(N)]])∘([[Z ^(N+1)]][[W ^(N)]])  (p-2)

The back propagation calculation means 320 calculates the share [[Z^(n)]] (n=N−1, . . . , 1), for example, according to the following equation.

[Math. 11]

[[Z ^(n)]]←Activation′([[U ^(n)]])∘([[Z ^(n+1)]][[W ^(n)]])  (p-3)

At S330, the gradient calculation means 330 calculates a share [[G⁰]] of a gradient G⁰ in the input layer, a share [[G^(n)]] of the gradient G^(n) in the n-th layer (n=1, . . . , N−1), and a share [[G^(N)]] of a gradient G^(N) in the output layer from the share [[Z^(n)]] (n=1, . . . , N+1) by using the share [[X]] and the share [[Y^(n)]] (n=1, . . . , N). The gradient calculation means 330 calculates the share [[G⁰]], for example, according to the following equation.

[Math. 12]

[[G ⁰]]←[[Z ¹]][[X]]  (q-1)

The gradient calculation means 330 calculates the share [[G^(n)]] (n=1, . . . , N), for example, according to the following equation.

[Math. 13]

[[G ^(n)]]←[[Z ^(n+1)]][[Y ^(n)]]  (q-2)
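Relationships (p-1) to (q-2) can be sketched on plaintext values as follows. This is not the secure protocol: it works on ordinary floats, assumes ReLU as the activation function, and makes explicit a shape convention that is an assumption of this sketch, namely that vectors are columns, that the product of Z^(n+1) and W^(n) means multiplication by the transpose of W^(n), and that the product of Z^(n+1) and Y^(n) is an outer product with the same shape as W^(n). All function names are hypothetical.

```python
def transpose_matvec(W, z):
    """Compute (W transposed) times z, i.e. the row-vector product z W."""
    return [sum(W[r][c] * z[r] for r in range(len(W))) for c in range(len(W[0]))]

def relu_prime(u):
    """Derivative of ReLU, element-wise: 0 for x <= 0 and 1 for x > 0."""
    return [1.0 if v > 0 else 0.0 for v in u]

def outer(z, y):
    """Outer product of z and y; the result has the shape of the weight matrix."""
    return [[zi * yj for yj in y] for zi in z]

def backward(weights, x, us, ys, t):
    """weights = [W0..WN], us = [U1..U(N+1)], ys = [Y1..Y(N+1)], t = label.
    Returns the gradients [G0, ..., GN]."""
    N = len(weights) - 1
    zs = {N + 1: [yi - ti for yi, ti in zip(ys[N], t)]}   # (p-1): Z^(N+1)
    for n in range(N, 0, -1):                              # (p-2), (p-3): Z^n
        back = transpose_matvec(weights[n], zs[n + 1])
        zs[n] = [d * b for d, b in zip(relu_prime(us[n - 1]), back)]
    grads = [outer(zs[1], x)]                              # (q-1): G0
    for n in range(1, N + 1):                              # (q-2): Gn
        grads.append(outer(zs[n + 1], ys[n - 1]))
    return grads

# Made-up example with N = 2, scalar hidden layers, and a two-class output.
weights = [[[1.0]], [[1.0]], [[1.0], [-1.0]]]
us = [[2.0], [2.0], [2.0, -2.0]]
ys = [[2.0], [2.0], [0.75, 0.25]]
grads = backward(weights, [2.0], us, ys, [1.0, 0.0])
```

Each returned gradient has the same shape as the weight matrix it updates, so the subsequent parameter update can be an element-wise subtraction.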

At S340, the parameter update means 340 updates the share [[W^(n)]] (n=0, . . . , N) by using the share [[G^(n)]] (n=0, . . . , N). The parameter update means 340 updates the share [[W⁰]], for example, according to the following equation.

[Math. 14]

[[G ⁰]]←rshift([[G ⁰]],H)  (r-1)

[[W ⁰]]←[[W ⁰]]−[[G ⁰]]  (r-2)

The parameter update means 340 updates the share [[W^(n)]] (n=1, . . . , N−1), for example, according to the following equation.

[Math. 15]

[[G ^(n)]]←rshift([[G ^(n)]],H)  (r-3)

[[W ^(n)]]←[[W ^(n)]]−[[G ^(n)]]  (r-4)

The parameter update means 340 updates the share [[W^(N)]], for example, according to the following equation.

[Math. 16]

[[G ^(N)]]←rshift([[G ^(N)]],H)  (r-5)

[[W ^(N)]]←[[W ^(N)]]−[[G ^(N)]]  (r-6)

Here H is a value defined by the following equation.

[Math. 17]

$\begin{matrix} {H = {- \left\lfloor {\log_{2}\frac{\eta}{m}} \right\rfloor}} & (2) \end{matrix}$

Here, η represents the learning rate, and m represents the batch size.

To efficiently calculate the multiplication by the ratio η/m of the learning rate η to the batch size m, the ratio η/m is approximated by a power of 2, 2^(−H), using Equation (2). For this reason, the learning rate η and the batch size m may both be set to powers of 2.

This H is used to calculate the update of the share [[W^(n)]] (n=0, . . . , N), i.e., to perform rshift by H bits in Relationship (r-1), Relationship (r-3), and Relationship (r-5).
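The role of H can be illustrated on plaintext integers. The sketch below computes H from Equation (2) and applies one update of the form of Relationships (r-1) and (r-2) to a single integer-encoded parameter. The function names shift_amount and sgd_step_fixed and the numeric values are hypothetical, and Python's arithmetic shift is used as a stand-in for the secure right shift.

```python
import math

def shift_amount(eta, m):
    """H = -floor(log2(eta / m)), as in Equation (2)."""
    return -math.floor(math.log2(eta / m))

def sgd_step_fixed(w, g, H):
    """Update an integer-encoded parameter w with an integer gradient g.
    The multiplication by eta / m is replaced by a right shift by H bits."""
    return w - (g >> H)

H_exact = shift_amount(0.5, 64)     # eta / m = 2**-7, so the shift is exact
H_approx = shift_amount(0.1, 100)   # eta / m = 0.001 is approximated by 2**-10
w_new = sgd_step_fixed(1000, 256, H_exact)
```

When η and m are both powers of 2, 2^(−H) equals η/m exactly; otherwise the shift only approximates the ratio, as in the second example.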

At S350, in a case where the end condition is satisfied, the end condition determination means 350 outputs the share [[W^(n)]] (n=0, . . . , N) at the time, and terminates the process. Otherwise, the secure neural network learning system 30 returns to the processing of S310. That is, the secure neural network learning system 30 repeats the processing from S310 to S350. Note that the case where the end condition is satisfied is the case where the counter t reaches the threshold value T in the case where the number of learning times is used for the determination of the end of learning. Similarly, the case where the end condition is satisfied is the case where the amount of change of the parameter is smaller than c in the case where a criterion of whether or not the amount of change of the parameter has become sufficiently small is used. In the case where the number of learning times is used for the determination of the end of learning, the end condition determination means 350 increments the value of the counter t by one.
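The two end conditions can be sketched as a plain loop. This is an illustrative skeleton only: the callable step is a hypothetical stand-in for one pass of S310 to S340, the parameters are a flat list of numbers rather than shares, and the point at which the counter is incremented is a simplification of this sketch.

```python
def train(params, step, max_iters=None, tol=None):
    """Repeat `step` until the counter reaches max_iters or the largest
    parameter change falls below tol. Returns (params, iterations)."""
    if max_iters is None and tol is None:
        raise ValueError("at least one end condition is required")
    t = 0                                   # counter for the number of learning times
    while True:
        new_params = step(params)           # stands in for S310 to S340
        change = max(abs(a - b) for a, b in zip(new_params, params))
        params = new_params
        t += 1
        if max_iters is not None and t >= max_iters:
            return params, t                # end condition: counter reached threshold
        if tol is not None and change < tol:
            return params, t                # end condition: change is small enough

def halve(p):
    """Made-up update rule used only to exercise the loop."""
    return [0.5 * v for v in p]

by_count = train([1.0], halve, max_iters=3)
by_change = train([1.0], halve, tol=0.1)
```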

Although the calculation in the forward propagation calculation means 310 needs to be calculated in the order of the input layer, the hidden layer, and the output layer and the calculation in the back propagation calculation means 320 needs to be calculated in the order of the output layer, the hidden layer, and the input layer, the calculation in the gradient calculation means 330 and the calculation in the parameter update means 340 can be processed in parallel for each layer, and the efficiency of the processing can be increased by processing together.

According to the embodiment of the present invention, it is possible to perform secure computing for the learning of the neural network at high speed and with high accuracy. By using the first embodiment, softmax functions can be calculated with high accuracy, so the secure computing for the learning of the neural network can be performed with high accuracy compared to the conventional techniques of NPL 1 or NPL 2.

(Modification)

Similar to the modification of the second embodiment, by setting the required accuracy for each variable or constant and appropriately performing the right shift rshift, secure computing for the learning of the neural network can be efficiently implemented with fixed point numbers while intentionally dropping digits and preventing overflow. The operation of the secure neural network learning system 30, which is devised to prevent overflow caused by multiplication, will be described, assuming that the accuracies required for the variable x representing the learning data X, the variable y representing the output value Y^(N+1) of the output layer and the learning label T, and the variable w representing the parameter W⁰ of the input layer, the parameter W^(n) (n=1, . . . , N−1) of the n-th layer, and the parameter W^(N) of the output layer are b_(x), b_(y), and b_(w), respectively. Specifically, the operations of S310 to S340 in FIG. 11 will be described.

At S310, the forward propagation calculation means 310 performs the processing of S210 to S230 described in the modification of the second embodiment.

At S320, the back propagation calculation means 320 calculates the share [[Z^(N+1)]] with Relationship (p-1). The back propagation calculation means 320 calculates the share [[Z^(N)]], for example, according to the following equation.

[Math. 18]

[[Z ^(N)]]←Activation′([[U ^(N)]])∘([[Z ^(N+1)]][[W ^(N)]])  (p-2)

[[Z ^(N)]]←rshift([[Z ^(N)]],b _(y))  (p-4)

The back propagation calculation means 320 calculates the share [[Z^(n)]] (n=N−1, . . . , 1), for example, according to the following equation.

[Math. 19]

[[Z ^(n)]]←Activation′([[U ^(n)]])∘([[Z ^(n+1)]][[W ^(n)]])  (p-3)

[[Z ^(n)]]←rshift([[Z ^(n)]],b _(w))  (p-5)

At S330, the gradient calculation means 330 calculates the share [[G⁰]] with Relationship (q-1). The gradient calculation means 330 calculates the share [[G^(n)]] (n=1, . . . , N) according to Relationship (q-2).

At S340, the parameter update means 340 calculates the share [[W⁰]], for example, according to the following equation.

[Math. 20]

[[G ⁰]]←rshift([[G ⁰]],b _(x) +H)  (r-1)′

[[W ⁰]]←[[W ⁰]]−[[G ⁰]]  (r-2)

The parameter update means 340 calculates the share [[W^(n)]] (n=1, . . . , N−1), for example, according to the following equation.

[Math. 21]

[[G ^(n)]]←rshift([[G ^(n)]],b _(w) +b _(x) +H)  (r-3)′

[[W ^(n)]]←[[W ^(n)]]−[[G ^(n)]]  (r-4)

The parameter update means 340 calculates the share [[W^(N)]], for example, according to the following equation.

[Math. 22]

[[G ^(N)]]←rshift([[G ^(N)]],b _(x) +b _(y) +H)  (r-5)′

[[W ^(N)]]←[[W ^(N)]]−[[G ^(N)]]  (r-6)

Here, in the calculation of the right shift in the parameter update means 340 (specifically, the calculation of Relationship (r-1)′, Relationship (r-3)′, and Relationship (r-5)′), the scaling by the learning rate and the batch size is approximated by a right shift by H bits, and the right shift for the accuracy adjustment is performed at the same time, which reduces the cost of the right shift operations.
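The saving comes from the fact that two successive right shifts can be merged into one. The following plaintext sketch checks this for the shift amount of Relationship (r-1)′. The function rshift here is Python's arithmetic shift used as a stand-in for the secure right shift, whose exact rounding behavior may differ, and the numeric values are made up.

```python
def rshift(v, bits):
    """Plaintext stand-in for the secure right shift (arithmetic shift)."""
    return v >> bits

b_x, H = 8, 7
g = 123456789                                  # integer-encoded gradient entry

two_steps = rshift(rshift(g, b_x), H)          # accuracy adjustment, then scaling
one_step = rshift(g, b_x + H)                  # single combined shift as in (r-1)'

neg_two_steps = rshift(rshift(-g, b_x), H)     # the same check for a negative entry
neg_one_step = rshift(-g, b_x + H)
```

Both variants give the same integer, but the combined form invokes the costly right shift once per gradient instead of twice.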

<Supplements>

FIG. 14 is a diagram illustrating an example of a functional configuration of a computer implementing each apparatus described above. The processing in each of the above-described apparatuses can be performed by causing a recording unit 2020 to read a program for causing a computer to function as each of the above-described apparatuses, and operating the program in a control unit 2010, an input unit 2030, an output unit 2040, and the like.

The apparatus according to the present invention includes, for example, as single hardware entities, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a communication unit to which a communication apparatus (for example, a communication cable) capable of communication with the outside of the hardware entity can be connected, a Central Processing Unit (CPU, which may include a cache memory, a register, and the like), a RAM or a ROM that is a memory, an external storage apparatus that is a hard disk, and a bus connected for data exchange with the input unit, the output unit, the communication unit, the CPU, the RAM, the ROM, and the external storage apparatuses. An apparatus (drive) capable of reading and writing from and to a recording medium such as a CD-ROM may be provided in the hardware entity as necessary. An example of a physical entity including such hardware resources is a general-purpose computer.

A program necessary to implement the above-described functions, data necessary for processing of this program, and the like are stored in the external storage apparatus of the hardware entity (the present invention is not limited to the external storage apparatus; for example, the program may be read out and stored in a ROM that is a dedicated storage apparatus). For example, data obtained by the processing of the program is appropriately stored in a RAM, the external storage apparatus, or the like.

In the hardware entity, each program and data necessary for the processing of each program stored in the external storage apparatus (or a ROM, for example) are read into a memory as necessary and appropriately interpreted, executed, or processed by a CPU. As a result, the CPU implements a predetermined function (each of components represented by xxx unit, xxx means, or the like).

The present invention is not limited to the above-described embodiments, and appropriate changes can be made without departing from the spirit of the present invention. The processing described in the embodiments is not only executed in time series in the described order, but may also be executed in parallel or individually according to the processing capability of an apparatus that executes the processing or as necessary.

As described above, when a processing function in the hardware entity (the apparatus of the present invention) described in the embodiment is implemented by a computer, processing content of a function that the hardware entity should have is described by a program. By executing this program using the computer, the processing function in the hardware entity is implemented on the computer.

The program in which the processing details are described can be recorded on a computer-readable recording medium. The computer-readable recording medium, for example, may be any type of medium such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk apparatus, a flexible disk, a magnetic tape, or the like can be used as a magnetic recording apparatus, a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable)/RW (ReWritable), or the like can be used as an optical disc, a MO (Magneto-Optical disc) or the like can be used as a magneto-optical recording medium, and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like can be used as a semiconductor memory.

In addition, the program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or a CD-ROM with the program recorded on it. Further, the program may be stored in a storage device of a server computer and transmitted from the server computer to another computer via a network so that the program is distributed.

For example, a computer executing the program first temporarily stores the program recorded on the portable recording medium or the program transmitted from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own storage device and executes the processing in accordance with the read program. Further, as another execution mode of this program, the computer may directly read the program from the portable recording medium and execute processing in accordance with the program, or, further, may sequentially execute the processing in accordance with the received program each time the program is transferred from the server computer to the computer. In addition, another configuration to execute the processing through a so-called application service provider (ASP) service in which processing functions are implemented just by issuing an instruction to execute the program and obtaining results without transmitting the program from the server computer to the computer may be employed. Further, the program in this mode is assumed to include information which is provided for processing of a computer and is equivalent to a program (data or the like that has characteristics of regulating processing of the computer rather than being a direct instruction to the computer).

Although the hardware entity is configured by a predetermined program being executed on the computer in the present embodiment, at least a part of the processing content of the hardware entity may be implemented in hardware. 

1. A secure softmax function calculation system for calculating a share ([[softmax (u₁)]], . . . , [[softmax (u_(J))]]) of a value (softmax (u₁), . . . , softmax (u_(J))) of a softmax function for an input vector (u₁, . . . , u_(J)) from a share ([[u₁]], . . . , [[u_(J)]]) of the input vector (u₁, . . . , u_(J)) (where J is an integer greater than or equal to 1), the secure softmax function calculation system being constituted with three or more secure softmax function calculation apparatuses, map₁ being secure batch mapping defined by a parameter (a₁, . . . , a_(K)) representing a domain of definition and a parameter (α₁, . . . , α_(K)) representing a range of values (where K is an integer greater than or equal to 2, a₁, . . . , a_(K) are real numbers that meet a₁< . . . <a_(K)) of a function exp (x), and map₂ being secure batch mapping defined by a parameter (b₁, . . . , b_(L)) representing a domain of definition and a parameter (β₁, . . . , β_(L)) representing a range of values (where L is an integer greater than or equal to 2, b₁, . . . , b_(L) are real numbers that meet b₁< . . . <b_(L)) of a function 1/x, the secure softmax function calculation system comprising: a subtraction means for calculating a share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) from the share ([[u₁]], . . . , [[u_(J)]]); a first secure batch mapping calculation means for calculating map₁ (([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]))=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]) from the share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) (where f(i, j) (1≤i, j≤K) is p where a_(p)≤u_(i)−u_(j)<a_(p+1)) to make ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . 
. , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]])=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]); an addition means for calculating a share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) from the share ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]]); and a second secure batch mapping calculation means for calculating map₂ (([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]))=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]) from the share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) (where g(i) (1≤i≤L) is p where b_(p)≤Σ_(j=1) ^(J) exp (u_(j)−u_(i))<b_(p+1)) to make ([[softmax (u₁)]], [[softmax (u₂)]], . . . , [[softmax (u_(J))]])=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]).
 2. The secure softmax function calculation system according to claim 1, wherein the parameters α₁ and α_(K) representing the range of values of the secure batch mapping map₁ are taken as α₁=0 and α_(K)=2^(b_(y)), respectively (where b_(y) represents the number of bits in the fractional portion of the fixed-point representation that determines the accuracy of an output of the softmax function).
 3. A secure softmax function calculation apparatus in a secure softmax function calculation system for calculating a share ([[softmax (u₁)]], . . . , [[softmax (u_(J))]]) of a value (softmax (u₁), . . . , softmax (u_(J))) of a softmax function for an input vector (u₁, . . . , u_(J)) from a share ([[u₁]], . . . , [[u_(J)]]) of the input vector (u₁, . . . , u_(J)) (where J is an integer greater than or equal to 1), the secure softmax function calculation system being constituted with three or more secure softmax function calculation apparatuses, map₁ being secure batch mapping defined by a parameter (a₁, . . . , a_(K)) representing a domain of definition and a parameter (α₁, . . . , α_(K)) representing a range of values (where K is an integer greater than or equal to 2, a₁, . . . , a_(K) are real numbers that meet a₁< . . . <a_(K)) of a function exp (x), and map₂ being secure batch mapping defined by a parameter (b₁, . . . , b_(L)) representing a domain of definition and a parameter (β₁, . . . , β_(L)) representing a range of values (where L is an integer greater than or equal to 2, b₁, . . . , b_(L) are real numbers that meet b₁< . . . <b_(L)) of a function 1/x, the secure softmax function calculation apparatus comprising: a subtraction means for calculating a share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) from the share ([[u₁]], . . . , [[u_(J)]]); a first secure batch mapping calculation means for calculating map₁ (([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]))=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]) from the share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . 
, [[u_(J)−u_(J)]]) (where f(i, j) (1≤i, j≤K) is p where a_(p)≤u_(i)−u_(j)<a_(p+1)) to make ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]])=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]); an addition means for calculating a share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) from the share ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]]); and a second secure batch mapping calculation means for calculating map₂ (([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]))=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]) from the share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) (where g(i) (1≤i≤L) is p where b_(p)≤Σ_(j=1) ^(J) exp (u_(j)−u_(i))<b_(p+1)) to make ([[softmax (u₁)]], [[softmax (u₂)]], . . . , [[softmax (u_(J))]])=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]).
 4. A secure softmax function calculation method in which a secure softmax function calculation system calculates a share ([[softmax (u₁)]], . . . , [[softmax (u_(J))]]) of a value (softmax (u₁), . . . , softmax (u_(J))) of a softmax function for an input vector (u₁, . . . , u_(J)) from a share ([[u₁]], . . . , [[u_(J)]]) of the input vector (u₁, . . . , u_(J)) (where J is an integer greater than or equal to 1), the secure softmax function calculation system being constituted with three or more secure softmax function calculation apparatuses, map₁ being secure batch mapping defined by a parameter (a₁, . . . , a_(K)) representing a domain of definition and a parameter (α₁, . . . , α_(K)) representing a range of values (where K is an integer greater than or equal to 2, a₁, . . . , a_(K) are real numbers that meet a₁< . . . <a_(K)) of a function exp (x), and map₂ being secure batch mapping defined by a parameter (b₁, . . . , b_(L)) representing a domain of definition and a parameter (β₁, . . . , β_(L)) representing a range of values (where L is an integer greater than or equal to 2, b₁, . . . , b_(L) are real numbers that meet b₁< . . . <b_(L)) of a function 1/x, the secure softmax function calculation method comprising: subtracting in which the secure softmax function calculation system calculates a share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) from the share ([[u₁]], . . . , [[u_(J)]]); calculating in which the secure softmax function calculation system calculates map₁ (([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]))=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]) from the share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . 
, [[u_(J)−u_(J)]]) (where f(i, j) (1≤i, j≤K) is p where a_(p)≤u_(i)−u_(j)<a_(p+1)) to make ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]])=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]); adding in which the secure softmax function calculation system calculates a share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) from the share ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]]); and calculating in which the secure softmax function calculation system calculates map₂ (([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]))=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]) from the share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) (where g(i) (1≤i≤L) is p where b_(p)≤Σ_(j=1) ^(J) exp (u_(j)−u_(i))<b_(p+1)) to make ([[softmax (u₁)]], [[softmax (u₂)]], . . . , [[softmax (u_(J))]])=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]).
 5. A secure neural network calculation system for calculating a share [[Y^(N+1)]] of an output value Y^(N+1) (N is an integer greater than or equal to 2) for input data X from a share [[X]] of the input data X, the secure neural network calculation system being constituted with three or more secure neural network calculation apparatuses, map₁ being secure batch mapping defined by a parameter (a₁, . . . , a_(K)) representing a domain of definition and a parameter (α₁, . . . , α_(K)) representing a range of values (where K is an integer greater than or equal to 2, a₁, . . . , a_(K) are real numbers that meet a₁< . . . <a_(K)) of a function exp (x), and map₂ being secure batch mapping defined by a parameter (b₁, . . . , b_(L)) representing a domain of definition and a parameter (β₁, . . . , β_(L)) representing a range of values (where L is an integer greater than or equal to 2, b₁, . . . , b_(L) are real numbers that meet b₁< . . . <b_(L)) of a function 1/x, and the secure neural network calculation system comprising: an input layer calculation means for calculating a share [[Y¹]] of an output value Y¹ of an input layer from a share [[X]] by using a share [[W⁰]] of a parameter W⁰ of an input layer; an n-th layer calculation means for calculating a share [[Y^(n+1)]] of an output value Y^(n+1) of an n-th layer from a share [[Y^(n)]] by using a share [[W^(n)]] of a parameter W^(n) of an n-th layer, where n=1, . . . , N−1; and an output layer calculation means for calculating a share [[Y^(N+1)]] of an output value Y^(N+1) of an output layer from a share [[Y^(N)]] by using a share [[W^(N)]] of a parameter W^(N) of an output layer, wherein (u₁, . . . , u_(J)) (where J is an integer greater than or equal to 1) is an intermediate output value U^(N+1) in the output layer, and the output layer calculation means includes: a subtraction means for calculating a share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . 
, [[u_(J)−u_(J)]]) from the share ([[u₁]], . . . , [[u_(J)]]) of the intermediate output value U^(N+1); a first secure batch mapping calculation means for calculating map₁ (([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]))=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]) from the share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) (where f(i, j) (1≤i, j≤K) is p where a_(p)≤u_(i)−u_(j)<a_(p+1)) to make ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]])=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]); an addition means for calculating a share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) from the share ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]]); and a second secure batch mapping calculation means for calculating map₂ (([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]))=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]) from the share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) (where g(i) (1≤i≤L) is p where b_(p)≤Σ_(j=1) ^(J) exp (u_(j)−u_(i))<b_(p+1)) to make [[Y^(N+1)]]=([[softmax (u₁)]], [[softmax (u₂)]], . . . , [[softmax (u_(J))]])=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]).
 6. A secure neural network learning system for updating a share [[W⁰]] of a parameter W⁰ of an input layer, a share [[W^(n)]] of a parameter W^(n) of an n-th layer (n=1, . . . , N−1, N is an integer greater than or equal to 2), and a share [[W^(N)]] of a parameter W^(N) of an output layer by using a share [[X]] of learning data X and a share [[T]] of a learning label T, the secure neural network learning system being constituted with three or more secure neural network learning apparatuses, map₁ being secure batch mapping defined by a parameter (a₁, . . . , a_(K)) representing a domain of definition and a parameter (α₁, . . . , α_(K)) representing a range of values (where K is an integer greater than or equal to 2, a₁, . . . , a_(K) are real numbers that meet a₁< . . . <a_(K)) of a function exp (x), and map₂ being secure batch mapping defined by a parameter (b₁, . . . , b_(L)) representing a domain of definition and a parameter (β₁, . . . , β_(L)) representing a range of values (where L is an integer greater than or equal to 2, b₁, . . . , b_(L) are real numbers that meet b₁< . . . <b_(L)) of a function 1/x, and the secure neural network learning system comprising: a forward propagation calculation means for calculating a share [[Y¹]] of an output value Y¹ of an input layer, a share [[Y^(n+1)]] of an output value Y^(n+1) of an n-th layer (n=1, . . . , N−1), and a share [[Y^(N+1)]] of an output value Y^(N+1) of an output layer from the share [[X]] by using the share [[W^(n)]] (n=0, . . . , N); a back propagation calculation means for calculating a share [[Z^(N+1)]] of an error Z^(N+1) in an output layer, a share [[Z^(n+1)]] of an error Z^(n+1) in an n-th layer (n=N−1, . . . , 1), and a share [[Z¹]] of an error Z¹ in an input layer from the share [[Y^(N+1)]] and the share [[T]] by using the share [[W^(n)]] (n=N, . . . 
, 1); a gradient calculation means for calculating a share [[G⁰]] of a gradient G⁰ in the input layer, a share [[G^(n)]] of the gradient G^(n) in the n-th layer (n=1, . . . , N−1), and a share [[G^(N)]] of a gradient G^(N) in the output layer from the share [[Z^(n)]] (n=1, . . . , N+1) by using the share [[X]] and the share [[Y^(n)]] (n=1, . . . , N); and a parameter update means for updating the share [[W^(n)]] (n=0, . . . , N) by using the share [[G^(n)]] (n=0, . . . , N), wherein (u₁, . . . , u_(J)) (where J is an integer greater than or equal to 1) is an intermediate output value U^(N+1) in the output layer, and the forward propagation calculation means includes: a subtraction means for calculating a share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) from the share ([[u₁]], . . . , [[u_(J)]]) of the intermediate output value U^(N+1); a first secure batch mapping calculation means for calculating map₁ (([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]))=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]) from the share ([[u₁−u₁]], [[u₂−u₁]], . . . , [[u_(J)−u₁]], [[u₁−u₂]], . . . , [[u_(J)−u₂]], . . . , [[u₁−u_(J)]], . . . , [[u_(J)−u_(J)]]) (where f(i, j) (1≤i, j≤K) is p where a_(p)≤u_(i)−u_(j)<a_(p+1)) to make ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]])=([[α_(f(1, 1))]], [[α_(f(2, 1))]], . . . , [[α_(f(J, 1))]], [[α_(f(1, 2))]], . . . , [[α_(f(J, 2))]], . . . , [[α_(f(1, J))]], . . . , [[α_(f(J, J))]]); an addition means for calculating a share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . 
, [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) from the share ([[exp (u₁−u₁)]], [[exp (u₂−u₁)]], . . . , [[exp (u_(J)−u₁)]], [[exp (u₁−u₂)]], . . . , [[exp (u_(J)−u₂)]], . . . , [[exp (u₁−u_(J))]], . . . , [[exp (u_(J)−u_(J))]]); and a second secure batch mapping calculation means for calculating map₂ (([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]))=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]) from the share ([[Σ_(j=1) ^(J) exp (u_(j)−u₁)]], [[Σ_(j=1) ^(J) exp (u_(j)−u₂)]], . . . , [[Σ_(j=1) ^(J) exp (u_(j)−u_(J))]]) (where g(i) (1≤i≤L) is p where b_(p)≤Σ_(j=1) ^(J) exp (u_(j)−u_(i))<b_(p+1)) to make [[Y^(N+1)]]=([[softmax (u₁)]], [[softmax (u₂)]], . . . , [[softmax (u_(J))]])=([[β_(g(1))]], [[β_(g(2))]], . . . , [[β_(g(J))]]).
 7. A non-transitory computer-readable storage medium storing a program causing a computer to function as the secure softmax function calculation apparatus according to claim 3.
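The four steps recited in claims 1 and 2 (subtraction, table lookup for exp (x), addition, table lookup for 1/x) can be illustrated by the following plaintext sketch. It is not the secure protocol itself: it operates on ordinary floating-point values rather than shares, and it relies on the identity softmax (u_i) = 1/Σ_j exp (u_j − u_i). The function names, the table size, the value b_y=10, the uniform grid for exp (x), the geometric grid for 1/x, and the clamping of inputs that fall outside a table's domain are all assumptions made for illustration and are not taken from the specification.

```python
import bisect
import math


def make_exp_table(b_y, size):
    """Lookup table for exp(x): points a_1 < ... < a_K, values alpha_1..alpha_K."""
    hi = b_y * math.log(2.0)              # exp(hi) equals 2**b_y
    lo = -hi
    step = (hi - lo) / (size - 1)
    points = [lo + p * step for p in range(size)]
    values = [math.exp(a) for a in points]
    values[0] = 0.0                       # alpha_1 = 0, as in claim 2
    values[-1] = float(2 ** b_y)          # alpha_K = 2**b_y, as in claim 2
    return points, values


def make_inv_table(num_inputs, b_y, size):
    """Lookup table for 1/x on [1, J * 2**b_y]; geometric grid is an assumption."""
    lo, hi = 1.0, float(num_inputs * 2 ** b_y)
    points = [lo * (hi / lo) ** (p / (size - 1)) for p in range(size)]
    values = [1.0 / b for b in points]
    return points, values


def batch_map(table, xs):
    """Map each x to the value at index p, where points[p] <= x < points[p+1]."""
    points, values = table
    out = []
    for x in xs:
        p = bisect.bisect_right(points, x) - 1    # largest p with points[p] <= x
        p = min(max(p, 0), len(points) - 1)       # clamp out-of-domain inputs (assumption)
        out.append(values[p])
    return out


def table_softmax(u, b_y=10, size=4096):
    """Plaintext analogue of the claimed steps; returns approximate softmax(u)."""
    n = len(u)
    exp_table = make_exp_table(b_y, size)
    inv_table = make_inv_table(n, b_y, size)
    # Subtraction: all differences u_j - u_i, grouped by i.
    diffs = [u[j] - u[i] for i in range(n) for j in range(n)]
    # First batch mapping: approximate exp(u_j - u_i).
    exps = batch_map(exp_table, diffs)
    # Addition: sum over j for each i.
    sums = [sum(exps[i * n:(i + 1) * n]) for i in range(n)]
    # Second batch mapping: approximate 1 / sum, which equals softmax(u_i).
    return batch_map(inv_table, sums)
```

For example, table_softmax([1.0, 2.0, 3.0]) returns values close to the exact softmax (about 0.090, 0.245, 0.665), up to the quantization error of the two tables. Capping the exp table at 2^(b_y) keeps the table small: once a difference is large enough that exp exceeds 2^(b_y), the reciprocal of the sum is already below the smallest value representable with b_y fractional bits.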